
    Models of Visual Attention in Deep Residual CNNs

    Feature reuse from earlier layers in neural network hierarchies has been shown to improve the quality of features at later stages, a concept known as residual learning. In this thesis, we develop residual learning methodologies infused with attention mechanisms and study their effect on different tasks. To this end, we propose three architectures across medical image segmentation and 3D point cloud analysis. In FocusNet, we propose an attention-based dual-branch encoder-decoder structure that learns an extremely efficient attention mechanism and achieves state-of-the-art results on the ISIC 2017 skin cancer segmentation dataset. We propose a novel loss enhancement that improves the convergence of FocusNet, performing better than state-of-the-art loss functions such as the Tversky and focal losses. Evaluation of the architecture reveals two drawbacks, which we fix in FocusNetAlpha. Our novel residual group attention block forms the backbone of this architecture, learning distinct features with sparse correlations, which is the key reason for its effectiveness. At the time of writing this thesis, FocusNetAlpha outperforms all state-of-the-art convolutional autoencoders while using the fewest parameters and FLOPs, based on our experiments on the ISIC 2018, DRIVE retinal vessel segmentation and cell nuclei segmentation datasets. We then shift our attention to 3D point cloud processing, where we propose SAWNet, which combines global and local point embeddings infused with attention to create a spatially aware embedding that outperforms both. We propose a novel method to learn global feature aggregation for point clouds via a fully differentiable block that needs few trainable parameters and gives clear performance boosts. SAWNet beats state-of-the-art results on the ModelNet40 and ShapeNet part segmentation datasets.
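    The residual feature reuse underpinning all three architectures can be sketched in a few lines (a minimal illustration, not any specific block from the thesis; `transform` stands in for an arbitrary learned mapping):

    ```python
    import numpy as np

    def residual_block(x, transform):
        """Residual feature reuse: the block adds its input back to its output,
        so later layers see earlier features directly and gradients have a
        short path to earlier layers."""
        return transform(x) + x

    x = np.array([1.0, 2.0, 3.0])
    y = residual_block(x, lambda v: 0.1 * v)   # identity-dominated update
    print(y)   # [1.1 2.2 3.3]
    ```

    With the identity path, an all-zero transform reduces the block to a pass-through, which is part of why deep residual stacks remain trainable.
    
    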

    FocusNet: An attention-based Fully Convolutional Network for Medical Image Segmentation

    We propose a novel technique to incorporate attention within convolutional neural networks using feature maps generated by a separate convolutional autoencoder. Our attention architecture is well suited for incorporation with deep convolutional networks. We evaluate our model on benchmark segmentation datasets in skin cancer segmentation and lung lesion segmentation. Results show highly competitive performance when compared with U-Net and its residual variant.
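    The core idea of gating one branch's features with maps from a separate autoencoder branch can be sketched as follows (a hedged toy version assuming element-wise sigmoid gating; `attention_gate` is a hypothetical name, and the real architecture learns the gate through convolutions):

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def attention_gate(encoder_feat, autoencoder_feat):
        """Gate encoder features with a map derived from a separate
        autoencoder branch (shapes assumed to match: C x H x W)."""
        gate = sigmoid(autoencoder_feat)   # per-pixel weights in (0, 1)
        return encoder_feat * gate         # element-wise soft attention

    # toy 1-channel 2x2 feature maps
    enc = np.array([[[1.0, 2.0], [3.0, 4.0]]])
    att = np.zeros_like(enc)               # sigmoid(0) = 0.5 everywhere
    out = attention_gate(enc, att)         # halves every activation
    ```

    The gate rescales activations rather than zeroing them, so the encoder signal is softly reweighted toward regions the attention branch considers relevant.
    
    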

    FocusNet++: Attentive Aggregated Transformations for Efficient and Accurate Medical Image Segmentation

    We propose a new residual block for convolutional neural networks and demonstrate its state-of-the-art performance in medical image segmentation. We combine attention mechanisms with group convolutions to create our group attention mechanism, which forms the fundamental building block of our network, FocusNet++. We employ a hybrid loss based on balanced cross entropy, the Tversky loss and the adaptive logarithmic loss to enhance performance along with fast convergence. Our results show that FocusNet++ achieves state-of-the-art results across various benchmark metrics for the ISIC 2018 melanoma segmentation and the cell nuclei segmentation datasets with fewer parameters and FLOPs.
    Comment: Published at ISBI 202
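    Two of the hybrid loss's ingredients have standard forms that can be sketched directly; the mixture weight `lam` below is an assumption for illustration, not the paper's exact formulation:

    ```python
    import numpy as np

    def tversky_loss(y_true, y_pred, alpha=0.7, beta=0.3, eps=1e-7):
        """1 - Tversky index; alpha weights false negatives, beta false
        positives, so recall/precision trade-off is tunable."""
        tp = np.sum(y_true * y_pred)
        fn = np.sum(y_true * (1 - y_pred))
        fp = np.sum((1 - y_true) * y_pred)
        return 1.0 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)

    def balanced_bce(y_true, y_pred, eps=1e-7):
        """Cross entropy with the foreground term weighted by background
        frequency, countering class imbalance."""
        y_pred = np.clip(y_pred, eps, 1 - eps)
        w = 1.0 - np.mean(y_true)              # foreground weight
        return -np.mean(w * y_true * np.log(y_pred)
                        + (1 - w) * (1 - y_true) * np.log(1 - y_pred))

    def hybrid_loss(y_true, y_pred, lam=0.5):
        # illustrative equal weighting; the paper's mixture may differ
        return lam * balanced_bce(y_true, y_pred) \
               + (1 - lam) * tversky_loss(y_true, y_pred)
    ```

    A perfect prediction drives both terms toward zero, while an inverted mask is penalised heavily by the Tversky term.
    
    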

    Penalizing small errors using an Adaptive Logarithmic Loss

    Loss functions are error metrics that quantify the difference between a prediction and its corresponding ground truth. Fundamentally, they define a functional landscape for traversal by gradient descent. Although numerous loss functions have been proposed to date to handle various machine learning problems, little attention has been given to enhancing these functions to better traverse the loss landscape. In this paper, we simultaneously and significantly mitigate two prominent problems in medical image segmentation, namely: i) class imbalance between foreground and background pixels, and ii) poor loss function convergence. To this end, we propose an adaptive logarithmic loss function. We compare this loss function with the existing state of the art on the ISIC 2018 dataset, the nuclei segmentation dataset, and the DRIVE retinal vessel segmentation dataset. We measure the performance of our methodology on benchmark metrics and demonstrate state-of-the-art performance. More generally, we show that our system can be used as a framework for better training of deep neural networks.
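    One way to read "penalizing small errors" is to apply a logarithmic transform to a base loss so that the gradient stays large near the optimum. The sketch below is an illustrative form under that assumption, built on a Dice base loss; it is not the paper's exact definition, and the `knee` parameter is hypothetical:

    ```python
    import numpy as np

    def dice_loss(y_true, y_pred, eps=1e-7):
        """Standard soft Dice loss for segmentation masks."""
        inter = np.sum(y_true * y_pred)
        return 1.0 - (2 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

    def adaptive_log_loss(y_true, y_pred, knee=0.05):
        """Log transform of the Dice loss: the gradient is roughly 1/knee
        near zero error, so small residual errors keep driving the
        optimisation, then flattens for large errors (illustrative form)."""
        d = dice_loss(y_true, y_pred)
        return np.log1p(d / knee)
    ```

    At zero Dice loss the transformed loss is exactly zero, so the optimum is unchanged; only the shape of the landscape around it differs.
    
    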

    FatNet: feature-attentive network for 3D point cloud processing

    The application of deep learning to 3D point clouds is challenging due to their lack of order. Inspired by the point embeddings of PointNet and the edge embeddings of DGCNNs, we propose three improvements to the task of point cloud analysis. First, we introduce a novel feature-attentive neural network layer, the FAT layer, that combines both global point-based features and local edge-based features in order to generate better embeddings. Second, we find that applying the same attention mechanism across two different forms of feature map aggregation, max pooling and average pooling, gives better performance than either alone. Third, we observe that residual feature reuse in this setting propagates information more effectively between the layers and makes the network easier to train. Our architecture achieves state-of-the-art results on the task of point cloud classification, as demonstrated on the ModelNet40 dataset, and extremely competitive performance on the ShapeNet part segmentation challenge.
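    The second improvement, sharing one attention mechanism across max- and average-pooled aggregations, can be sketched as a weighted fusion of the two order-invariant summaries (a hypothetical, much-reduced version; `fused_aggregation` and the single shared weight vector `w_att` are illustrative names):

    ```python
    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / np.sum(e)

    def fused_aggregation(point_feats, w_att):
        """Combine max- and average-pooled point-cloud features with one
        shared attention vector. point_feats: (N, C) per-point features;
        w_att: (C,) learned weights, shared across both aggregations."""
        g_max = point_feats.max(axis=0)    # (C,), permutation-invariant
        g_avg = point_feats.mean(axis=0)   # (C,), permutation-invariant
        # one scalar score per aggregation, from the same attention weights
        scores = softmax(np.array([g_max @ w_att, g_avg @ w_att]))
        return scores[0] * g_max + scores[1] * g_avg
    ```

    Because both pooled summaries ignore point order, the fused embedding is permutation-invariant, which is the key requirement for point cloud networks.
    
    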

    Optimizing Vision Transformers for Medical Image Segmentation

    For medical image semantic segmentation (MISS), Vision Transformers have emerged as strong alternatives to convolutional neural networks thanks to their inherent ability to capture long-range correlations. However, existing research uses off-the-shelf Vision Transformer blocks based on linear projections and feature processing, which lack the spatial and local context needed to refine organ boundaries. Furthermore, Transformers do not generalize well on small medical imaging datasets and rely on large-scale pre-training due to limited inductive biases. To address these problems, we present the design of a compact and accurate Transformer network for MISS, CS-Unet, which introduces convolutions in a multi-stage design to hierarchically enhance the spatial and local modeling ability of Transformers. This is mainly achieved by our Convolutional Swin Transformer (CST) block, which merges convolutions with Multi-Head Self-Attention and Feed-Forward Networks to provide inherent localized spatial context and inductive biases. Experiments demonstrate that CS-Unet without pre-training outperforms its counterparts by large margins on multi-organ and cardiac datasets with fewer parameters, achieving state-of-the-art performance. Our code is available on GitHub.
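    The idea of injecting local context before self-attention can be sketched as a convolution followed by single-head attention (a heavily reduced, hypothetical illustration of the conv-plus-MHSA pattern, not the CST block itself; shapes and names are assumptions):

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def conv_then_attention(x, k, Wq, Wk, Wv):
        """Local context via a same-padded 1-D depthwise convolution along
        the token axis, then single-head scaled dot-product self-attention.
        x: (T, C) tokens; k: (3,) kernel shared across channels."""
        xp = np.pad(x, ((1, 1), (0, 0)), mode="edge")
        local = k[0] * xp[:-2] + k[1] * xp[1:-1] + k[2] * xp[2:]
        q, kk, v = local @ Wq, local @ Wk, local @ Wv
        att = softmax(q @ kk.T / np.sqrt(q.shape[-1]))
        return att @ v
    ```

    The convolution supplies the locality bias that plain linear projections lack, while the attention stage still mixes information globally across all tokens.
    
    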

    Continuous Interaction With a Smart Speaker via Low-Dimensional Embeddings of Dynamic Hand Pose

    This paper presents a new continuous interaction strategy, with visual feedback of hand pose and mid-air gesture recognition and control, for a smart music speaker that uses only two video frames to recognize gestures. Frame-based hand pose features from MediaPipe Hands, containing 21 landmarks, are embedded into a two-dimensional pose space by an autoencoder. The corresponding space for interaction with the music content is created by embedding high-dimensional music track profiles into a compatible two-dimensional embedding. A PointNet-based model is then applied to classify gestures, which are used to control the device interaction or explore music spaces. By jointly optimising the autoencoder with the classifier, we learn a more useful embedding space for discriminating gestures. We demonstrate the functionality of the system with experienced users selecting different musical moods by varying their hand pose.
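    Jointly optimising the autoencoder with the classifier amounts to minimising a combined objective; a plausible sketch under the assumption of an MSE reconstruction term plus cross entropy, with a hypothetical trade-off weight `lam`:

    ```python
    import numpy as np

    def joint_loss(x, x_recon, logits, label, lam=0.5):
        """Joint objective: reconstruction error on the landmark vector plus
        cross entropy on the gesture classifier, so the low-dimensional
        embedding stays both faithful and discriminative. lam is an assumed
        trade-off weight, not a value from the paper."""
        recon = np.mean((x - x_recon) ** 2)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        ce = -np.log(p[label] + 1e-12)
        return lam * recon + (1 - lam) * ce
    ```

    Training against the summed loss pulls the embedding toward configurations that both reconstruct the 21-landmark pose and separate the gesture classes.
    
    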

    Survey: Leakage and Privacy at Inference Time

    Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance, as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage, which is natural to ML models; potential malevolent leakage, which is caused by privacy attacks; and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malevolent leakage and available defences, followed by the currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.
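    A classic instance of malevolent inference-time leakage is membership inference; a toy confidence-thresholding variant can be sketched as follows (an illustrative attack shape only; the threshold `tau` and helper name are assumptions, and real attacks are considerably more sophisticated):

    ```python
    import numpy as np

    def membership_guess(confidences, tau=0.9):
        """Toy confidence-thresholding membership-inference attack: models
        tend to be more confident on training members, so predict 'member'
        whenever the top softmax confidence exceeds tau."""
        return confidences > tau

    conf = np.array([0.99, 0.55, 0.97, 0.62])  # hypothetical top-class confidences
    guesses = membership_guess(conf)           # members at indices 0 and 2
    ```

    Defences surveyed in this space, such as confidence masking or differentially private training, work precisely by shrinking this confidence gap between members and non-members.
    
    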